Introduction

This post presents a regional analysis of data collected at a weather station in McCall, Idaho. The record goes back to March 1906 and includes the daily minimum and maximum temperature at the station for each date.

## 'data.frame':    38990 obs. of  24 variables:
##  $ STATION: Factor w/ 1 level "USC00105708": 1 1 1 1 1 1 1 1 1 1 ...
##  $ NAME   : Factor w/ 1 level "MCCALL, ID US": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DATE   : Factor w/ 38990 levels "1906-03-01","1906-03-02",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ DAPR   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ DASF   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MDPR   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MDSF   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PRCP   : num  0 0 0 3.8 0 0 0 0 0 0 ...
##  $ SNOW   : num  0 0 0 51 0 0 0 0 0 0 ...
##  $ SNWD   : num  1092 1092 1067 1118 1092 ...
##  $ TMAX   : num  2.2 1.1 2.2 2.2 3.9 8.3 8.9 8.3 9.4 6.7 ...
##  $ TMIN   : num  -11.1 -19.4 -10.6 -2.2 -6.7 -9.4 -10.6 -9.4 -7.8 -8.3 ...
##  $ TOBS   : num  -3.9 -2.2 -4.4 1.1 0 3.3 1.7 3.3 6.1 6.1 ...
##  $ WT01   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT03   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT04   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT05   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT06   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT08   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT09   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT11   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT14   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT16   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WT18   : int  NA NA NA NA NA NA NA NA NA NA ...
## [1] NA

Plotting Data

Here the data is plotted. The graphs show the changes in daily maximum temperature, which varies a lot between the hot summers and the cold winters in McCall. I then converted the daily highs and lows to monthly means.

## 
## Call:
## lm(formula = TMAX ~ NewDate, data = climate_data)
## 
## Coefficients:
## (Intercept)      NewDate  
##   1.243e+01    2.108e-05

## 
## Call:
## lm(formula = TMAX ~ NewDate, data = climate_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.287  -9.448  -1.020   9.864  27.894 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.243e+01  5.698e-02 218.074  < 2e-16 ***
## NewDate     2.108e-05  4.920e-06   4.285 1.83e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.07 on 38037 degrees of freedom
##   (951 observations deleted due to missingness)
## Multiple R-squared:  0.0004824,  Adjusted R-squared:  0.0004562 
## F-statistic: 18.36 on 1 and 38037 DF,  p-value: 1.834e-05
## 'data.frame':    1297 obs. of  5 variables:
##  $ Month: chr  "03" "04" "05" "06" ...
##  $ Year : chr  "1906" "1906" "1906" "1906" ...
##  $ TMAX : num  4.59 12.25 15.22 16.82 28.47 ...
##  $ YEAR : num  1906 1906 1906 1906 1906 ...
##  $ MONTH: num  3 4 5 6 7 8 9 10 11 12 ...
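
The daily-to-monthly step can be sketched roughly as follows. This is a minimal example on made-up daily temperatures, not the McCall file: the `daily` data frame and its values are assumptions, while the column names mirror the `str()` output above.

```r
# Synthetic daily highs standing in for the real station file (assumption).
set.seed(1)
dates <- seq(as.Date("1906-03-01"), as.Date("1908-12-31"), by = "day")
doy <- as.numeric(format(dates, "%j"))
daily <- data.frame(
  NewDate = dates,
  TMAX = 12 - 14 * cos(2 * pi * (doy - 15) / 365) + rnorm(length(dates), sd = 4)
)

# Split each date into month and year labels.
daily$Month <- format(daily$NewDate, "%m")
daily$Year  <- format(daily$NewDate, "%Y")

# Average the daily highs within each month of each year.
MonthlyTMAXMean <- aggregate(TMAX ~ Month + Year, data = daily, FUN = mean)

# Numeric copies are easier to use as regression predictors.
MonthlyTMAXMean$YEAR  <- as.numeric(MonthlyTMAXMean$Year)
MonthlyTMAXMean$MONTH <- as.numeric(MonthlyTMAXMean$Month)

str(MonthlyTMAXMean)
```

The formula form of `aggregate` drops rows with missing `TMAX` by default, so a month with a few missing days is averaged over the days that remain.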

## 
## Call:
## lm(formula = TMAX ~ YEAR, data = MonthlyTMAXMean[MonthlyTMAXMean$Month == 
##     "05", ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1256 -1.8211 -0.2484  1.6681  7.0097 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  7.304444  14.555709   0.502    0.617
## YEAR         0.004603   0.007405   0.622    0.536
## 
## Residual standard error: 2.458 on 105 degrees of freedom
## Multiple R-squared:  0.003667,   Adjusted R-squared:  -0.005822 
## F-statistic: 0.3864 on 1 and 105 DF,  p-value: 0.5355

The p-value for the May trend is 0.54, so the change in May maximum temperature over the years is not statistically significant.
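
To read the slope and p-value from a fit without scanning the printed summary, something like the sketch below should work. The `may` data frame is synthetic (an assumption, generated with no built-in trend) and only imitates the shape of the May subset; the 0.05 cutoff is the conventional threshold, not something taken from the analysis above.

```r
# Synthetic May means for 107 years with no real trend (assumption).
set.seed(42)
may <- data.frame(YEAR = 1906:2012)
may$TMAX <- 16 + rnorm(nrow(may), sd = 2.5)

fit <- lm(TMAX ~ YEAR, data = may)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients

slope   <- coefs["YEAR", "Estimate"]   # change in TMAX per year
p_value <- coefs["YEAR", "Pr(>|t|)"]   # p-value for the slope

alpha <- 0.05
cat(sprintf("slope = %.4f per year, p = %.3f\n", slope, p_value))
cat(if (p_value < alpha) "significant\n" else "not significant\n")
```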

##   Month Year       TMIN YEAR MONTH
## 1    03 1906 -9.4741935 1906     3
## 2    04 1906 -4.2800000 1906     4
## 3    05 1906 -0.8806452 1906     5
## 4    06 1906  1.9033333 1906     6
## 5    07 1906  6.9064516 1906     7
## 6    08 1906  4.3548387 1906     8

## % latex table generated in R 3.6.0 by xtable 1.8-4 package
## % Sun Sep 13 15:22:51 2020
## \begin{table}[ht]
## \centering
## \begin{tabular}{rlllll}
##   \hline
##  & Month & Slope TMIN & R\verb|^|2 TMIN & Slope TMAX & R\verb|^|2 TMAX \\ 
##   \hline
## 1 & January & 0.0429 *** & 0.146 & 0.021 ** & 0.095 \\ 
##   2 & February & 0.0346 *** & 0.121 & 0.0128 NS & 0.029 \\ 
##   3 & March & 0.038 *** & 0.231 & 0.0169 * & 0.048 \\ 
##   4 & April & 0.022 *** & 0.17 & 0.0075 NS & 0.008 \\ 
##   5 & May & 0.018 *** & 0.18 & 0.0046 NS & 0.004 \\ 
##   6 & June & 0.0119 ** & 0.074 & 0.0112 NS & 0.025 \\ 
##   7 & July & 0.0036 NS & 0.004 & 0.0019 NS & 0.001 \\ 
##   8 & August & 0.0179 *** & 0.098 & 0.0173 ** & 0.069 \\ 
##   9 & September & 0.0116 * & 0.042 & 0.0141 NS & 0.035 \\ 
##   10 & October & -7e-04 NS & 0 & 0.0057 NS & 0.005 \\ 
##   11 & November & 0.0103 NS & 0.024 & -0.0031 NS & 0.002 \\ 
##   12 & December & 0.008 NS & 0.008 & -0.0015 NS & 0.001 \\ 
##    \hline
## \end{tabular}
## \end{table}

This graph shows the monthly minimum temperatures in McCall.

Error: Incomplete expression: Results <- data.frame(Month = TMINresult[c(2:13),1], TMINSlope = TMINresult[c(2:13),2], TMIN_P = as.numeric(TMINresult[c(2:13),3]), TMINRsq = TMINresult[c(2:13),4],

#Error in TMAXresult[c(2:13), 2] : incorrect number of dimensions
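
An "incorrect number of dimensions" error generally means that an object was indexed with `[rows, columns]` even though it has only one dimension, for example a plain vector. One way to avoid it, sketched here on synthetic data, is to collect each month's regression results in a data frame. The helper `fit_month` and the `monthly` data are hypothetical and are not the code that produced the table above.

```r
# Synthetic monthly means (assumption): 12 months x 50 years.
set.seed(7)
monthly <- expand.grid(MONTH = 1:12, YEAR = 1961:2010)
monthly$TMIN <- -5 + 0.02 * (monthly$YEAR - 1961) + rnorm(nrow(monthly))

# Hypothetical helper: fit one regression for month m, return a one-row data frame.
fit_month <- function(m, data, var) {
  s <- summary(lm(reformulate("YEAR", response = var),
                  data = data[data$MONTH == m, ]))
  data.frame(Month = month.name[m],
             Slope = s$coefficients["YEAR", "Estimate"],
             P     = s$coefficients["YEAR", "Pr(>|t|)"],
             Rsq   = s$r.squared)
}

TMINresult <- do.call(rbind, lapply(1:12, fit_month, data = monthly, var = "TMIN"))

# A vector has no columns, so two-index subsetting fails on it.
v <- c(1, 2, 3)
msg <- tryCatch(v[2:3, 2], error = function(e) conditionMessage(e))
print(msg)

# The data frame keeps both dimensions, so [rows, columns] works.
print(TMINresult[1:3, c("Month", "Slope")])
```

Checking `dim(TMAXresult)` before subsetting would show whether the object is a table or has collapsed to a vector.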

Precipitation: Departure from Mean
climate_data$PRCP[climate_data$PRCP==-9999] <- NA
Missing <- aggregate(is.na(climate_data$PRCP),
list(climate_data$Month, climate_data$Year), sum)
# The aggregate command is used to create a simplified dataset. In this case
# we are counting the missing PRCP values in each month of each year.
# Decimal year: Group.2 holds the year and Group.1 the month.
Missing$Date <- as.numeric(Missing$Group.2) + (as.numeric(Missing$Group.1) - 1)/12
plot(x ~ Date, data=Missing)

This graph analyzes how much precipitation deviated from the mean.

# Aggregate by month and year to get monthly totals,
# then drop the months that have more than 4 missing days.
TotalPPT <- aggregate(climate_data$PRCP,
list(climate_data$Month, climate_data$Year), sum, na.rm=TRUE)
names(TotalPPT) = c("Group.1", "Group.2", "ppt")
NonMissing <- Missing[Missing$x < 5, c(1:3)]
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
PPT <- merge(TotalPPT, NonMissing, all.y=TRUE)
# Decimal year: Group.2 holds the year and Group.1 the month.
PPT$Date <- as.numeric(PPT$Group.2) + (as.numeric(PPT$Group.1) - 1)/12
head(PPT)
##   Group.1 Group.2   ppt x Date
## 1      01    1907 134.0 0 1907
## 2      01    1908  56.7 0 1908
## 3      01    1909 224.0 0 1909
## 4      01    1910  81.4 0 1910
## 5      01    1917  59.9 0 1917
## 6      01    1918 110.6 0 1918
#Finding the mean
PRCP_mean = mean(PPT$ppt)
plot(ppt~Date, data=PPT)
abline(h=PRCP_mean, col="blue")

The mean line does not tell us much on its own because the monthly totals are so scattered.
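
One way to make the departure from the mean easier to read is to compare each month with the long-term mean for that calendar month rather than with a single overall mean. The sketch below does this on synthetic monthly totals. The data, the `month_mean` and `departure` columns, and the millimetre unit on the axis label are all assumptions, and this is a different calculation from the single `PRCP_mean` line used above.

```r
# Synthetic monthly totals (assumption): wetter winters, drier summers.
set.seed(3)
PPT <- expand.grid(Month = 1:12, Year = 1950:1979)
PPT$ppt <- pmax(0, 60 + 40 * cos(2 * pi * (PPT$Month - 1) / 12) +
                   rnorm(nrow(PPT), sd = 20))
PPT$Date <- PPT$Year + (PPT$Month - 1) / 12   # decimal year

# Long-term mean for each calendar month, repeated on every matching row.
PPT$month_mean <- ave(PPT$ppt, PPT$Month, FUN = mean)

# Departure from that month's mean: positive means wetter than usual.
PPT$departure <- PPT$ppt - PPT$month_mean

plot(departure ~ Date, data = PPT, type = "h",
     ylab = "Departure from monthly mean (mm)")
abline(h = 0, col = "blue")
```

Because the seasonal cycle is removed, the remaining scatter reflects unusually wet or dry months instead of the normal winter and summer difference.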

# Looking at a few months: this code would not run, so I commented it all out.
#STATION£PRCP[STATION~PRCP==-9999] <- NA
#YearlySum = aggregate(PRCP ~ Year, NAME, sum)
#YearlySum£YEAR = as.numeric(YearlySum£Year)
#YearlyMean = mean(YearlySum£PRCP)
#plot(PRCP~YEAR, data=YearlySum, las=1, ty="p")
#abline(h=YearlyMean, col="blue")
#YearlySum.lm = lm(PRCP~YEAR, data=YearlySum)
#abline(coef(YearlySum.lm), col="green")
#n <- 5
#k <- rep(1/n, n)
#k
#y_lag <- stats::filter(YearlySum£PRCP, k, sides=1)
#lines(YearlySum~YEAR, y_lag, col="red")
#summary(YearlySum.lm)

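This code did not work for me. I received two errors: Error: unexpected input in “STATION£” and Error in eval(predvars, data, env) : object ‘YEAR’ not found. The first points at the £ characters, which are not valid R syntax; column access is written with $. The second is most likely a knock-on effect: the line that should create the YEAR column never ran, so the later calls could not find it.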
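
For comparison, here is a rough sketch of how the yearly analysis might look with `$` in place of the pound sign. It runs on synthetic daily precipitation, so the `climate` data frame and every number it produces are assumptions. I also swapped `abline(coef(...))` for `abline()` on the model object and passed x and y to `lines()` directly, which is my guess at what the commented-out lines intended.

```r
# Synthetic daily precipitation (assumption); -9999 flags a missing value.
set.seed(11)
dates <- seq(as.Date("1950-01-01"), as.Date("1979-12-31"), by = "day")
climate <- data.frame(Year = format(dates, "%Y"),
                      PRCP = round(rexp(length(dates), rate = 0.5), 1))
climate$PRCP[sample(nrow(climate), 50)] <- -9999

# Replace the missing-value flag with NA.
climate$PRCP[climate$PRCP == -9999] <- NA

# Yearly totals; the formula interface drops NA rows by default.
YearlySum <- aggregate(PRCP ~ Year, data = climate, FUN = sum)
YearlySum$YEAR <- as.numeric(YearlySum$Year)
YearlyMean <- mean(YearlySum$PRCP)

plot(PRCP ~ YEAR, data = YearlySum, las = 1, type = "p")
abline(h = YearlyMean, col = "blue")

# Linear trend in the yearly totals.
YearlySum.lm <- lm(PRCP ~ YEAR, data = YearlySum)
abline(YearlySum.lm, col = "green")

# Trailing 5-year moving average; the first n - 1 values are NA.
n <- 5
k <- rep(1 / n, n)
y_lag <- stats::filter(YearlySum$PRCP, k, sides = 1)
lines(YearlySum$YEAR, as.numeric(y_lag), col = "red")

summary(YearlySum.lm)
```

Writing `stats::filter` explicitly matters here because loading dplyr masks `filter`, as the attach message earlier in the post shows.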

par(mfrow=c(2,2))
plot(lm(TMIN ~ YEAR, data=MonthlyTMINMean[MonthlyTMINMean$MONTH==1,]))

These graphs check whether the data follows the assumptions of our statistical test. According to the Q-Q plot, the residuals look approximately normal.
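
A numerical check can back up the visual impression from the Q-Q plot. The sketch below runs a Shapiro-Wilk test on the residuals of a January regression fitted to synthetic data. The `jan` data frame is an assumption and the test is my addition, not part of the analysis above.

```r
# Synthetic January minimum temperatures with a mild warming trend (assumption).
set.seed(5)
jan <- data.frame(YEAR = 1906:2012)
jan$TMIN <- -12 + 0.04 * (jan$YEAR - 1906) + rnorm(nrow(jan), sd = 3)

fit <- lm(TMIN ~ YEAR, data = jan)
res <- residuals(fit)

# Q-Q plot of the residuals with a reference line.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal.
sw <- shapiro.test(res)
print(sw$p.value)
```

A large p-value does not prove normality; it only means the test found no strong evidence against it, so it is best read together with the plots.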